ABSTRACT


Because MOOCs suffer from high dropout rates, researchers are
trying to understand the causes and mitigate them. Focusing on
this task, we employ a composite model to infer the behavior
of learners in the coming weeks from their history logs of
learning activities, including interaction with video lectures,
participation in discussion forums, and performance on
assignments.


The prediction accuracy of our proposed model is higher than
that of related methods. In addition, we propose combining the
model with suggested interventions, such as sending reminder
emails to at-risk learners. Future work, which is currently
underway, will evaluate the effect of these interventions on
the dropout rate.


1. INTRODUCTION


Recently, online education, whose landmark concept is the
MOOC (Massive Open Online Course), has become a global craze
and has given rise to several MOOC platforms, including EdX,
Coursera, and Udacity. Because MOOCs leave learners free to
choose the time and place of study, a large number of learners
have benefited from this new form of online learning. A
typical MOOC lasts 6-12 weeks and attracts learners with
diverse backgrounds and major fields. Moreover, MOOC learners
may have different intentions and motivations, so they engage
to different extents and leave for various reasons.


Despite the increasing popularity of MOOCs, their extremely
low completion rate has been a concern from the beginning.
Dropout is regarded as one of the most critical problems of
MOOCs. Dropout refers to the situation in which a student
registers for a course, watches course materials, or even
attempts the quizzes, but eventually quits without taking the
final test. The average completion rate of MOOCs has been
reported to be as low as 7 percent, ranging from 0.8 percent
in Princeton's "History of the World since 1300" to 19.2
percent in the "Functional Programming Principles in Scala"
course [7]. MOOC platforms thus face a serious problem in
their high learner dropout rate.


Identifying at-risk learners by predicting their dropout
probability is therefore timely and important: early
prediction helps instructors give those learners proper
support, sustain their interest, and keep them on a regular
schedule of study rather than cramming or dropping out. To
address this task, we focus on predicting learners' states for
the next two consecutive weeks. We formulate the problem as a
multi-class classification problem and develop a Stacked
Sparse Autoencoder (SSAE)+Softmax model to solve it. Our model
has several advantages. First, it incorporates multiple
features that characterize learners' weekly engagement on the
MOOC platform. Second, it discovers correlations among the
observed explanatory features; the new compressed feature
representation produced by the SSAE serves as a better
classifier input than the original features. Third, the model
considers both the current and previous states when estimating
the next states, which makes it more flexible in modeling
students' dynamics.


Once a model has been trained to identify at-risk students, it
can be deployed on online MOOC platforms, where it can compute
students' at-risk rates regularly and send emails to them
automatically. We hope that some of these at-risk students
will then continue their learning.


This paper makes the following contributions:


1. We employ different composite models that incorporate
multiple features to infer behavior in the coming weeks from
the weekly history of learning data. The model is an
end-to-end neural network model, which means it can be trained
as a whole. Our results indicate that the SSAE+Softmax model
performs best and consistently achieves a higher AUC score
than the baseline SVM model.


2. We propose combining the model with suggested interventions
such as sending reminder emails to at-risk learners. Although
we do not conduct real experiments on sending emails, the
paper proposes a preliminary framework that uses the
prediction results to determine to whom reminder emails should
be sent and when.


3. We explore to what extent each single feature influences
the dropout probability and cluster dropout learners with the
k-means clustering algorithm, showing that features extracted
from course engagement are effective indicators of the
behavior pattern to which a low-performing learner belongs.
Future work will shed light on the relationships between
learners' behavior patterns and the reasons why they quit the
course.


The rest of this paper is organized as follows. Section 2
describes related work. Section 3 describes the dataset and
the features derived from it. Section 4 introduces our model
in detail. Experimental results and discussion are presented
in Sections 5 and 6. Finally, Section 7 concludes the paper.


2. RELATED WORKS 


Mitigating the MOOC dropout rate is essential to increasing
the value of MOOCs, so mechanisms that can predict student
dropout are becoming increasingly important.


Some exploratory analyses suggest that student behavior in the
discussion forum helps predict attrition. Yang et al. [6]
present a foundation for research into the social factors that
affect dropout during participation in MOOCs. To
operationalize these factors, they define metrics related to
posting behavior (thread starter, post length, content length)
and social positioning (posts and replies) within the
resulting reply network. Similarly, Ramesh et al. [8] explore
other aspects of the discussion forum, such as post views and
sentiment. This perspective offers a potentially valuable
source of insight for designing MOOCs that are more conducive
to social engagement, which promotes commitment and therefore
lowers attrition. Its application is limited, however, because
it mainly addresses learners who drop out for lack of
interpersonal connection online.


Many researchers aim to model learning behaviors over a period
of weeks. They extract significant features by parsing the
clickstream file, in which each line represents a web request.
These features include lecture interaction features, forum
interaction features, and assignment features [1-4,11], which
capture the activity level of learners.


In terms of applied models, Kloft et al. [5] explore support 
vector machines (SVM) to predict the state of learners in 
the later phases of a course. Balakrishnan et al. [2] quantize 
the feature space into a discrete number of observable states 
that are integral to a Discrete Single Stream HMM. Fei et
al. [9] propose a recurrent neural network (RNN) model with
long short-term memory (LSTM) cells.


3. DATA SET AND FEATURE SET 
3.1 Dataset 


The learner activity log data come from a public data mining
competition, KDD CUP 2015. The dataset includes 79,186
learners, each of whom enrolled in at least one of the 39
courses. In total, the clickstream data include 8,157,277 log
records, and the longest enrollment lifetime is 6 weeks. Most
of the data are user activity logs and course structure data.

3.2 Feature set

As stated above, our goal is to estimate the probability that 
a student stops engaging with a course for the next two 
weeks, given her/his learning activities up to the current 
time step. 


The dropout probabilities are closely related to learners'
engagement with courses, which is mainly characterized by
forum, lecture, and assessment features. To express the
time-varying behavior of learners, we extract 17 typical
features for each week t and each learner i, denoted as a
vector $x_i^{(t)} \in \mathbb{R}^{17}$, as presented in
Table 1. Note that the selected features are informative but
highly correlated with each other; we will introduce a model
to remove this redundancy.


Feature | Description
f1-f3   | Number of discussion posts, videos watched, and problems attempted in week t, respectively
f4-f6   | Total number of discussion posts, videos watched, and problems attempted up to week t
f7-f9   | Average number of discussion posts, videos watched, and problems attempted per week up to week t
f10-f12 | Average number of discussion posts, videos watched, and problems attempted per session in week t
f13     | Total number of other activities (navigate, access, page close, wiki) in week t
f14     | Total number of activities in week t
f15     | Number of active days in week t
f16     | Total time spent in week t
f17     | Number of sessions in week t


Table 1: List of features derived for week t


3.2.1 Interactions with forums 

A MOOC forum provides a platform for communication between
learners and lecturers. The more actively learners interact
with their peers, the more they feel that they belong in the
course and the more likely they are to complete the learning
tasks. Some features, such as viewing a post, receiving a
reply, following a thread, and up-voting, are strong
indicators of engagement and sense of community [6,7].


3.2.2 Interactions with lectures 

Because lecture videos are the most important learning
resource for participants, video-playing behavior should be
investigated, as other researchers have done. Among these
works, Kim et al. [1] explored click actions performed while
watching videos. These behaviors can be classified into six
types: skipping, zooming, playing, replaying, pausing, and
quitting.


3.2.3 Interactions with assignments 

It is reasonable to hypothesize that an active and engaged
student would check their assignments a few times every week,
because material is released and due on a weekly basis. By
monitoring this week by week, we can roughly estimate how
up-to-date a student is with a course. It is acknowledged that
a learner who falls too far behind finds it hard to catch up
and thus loses the determination to complete the course [2].


Furthermore, we observe from the user activity log data
whether learners are active within a session, as the data
contain multiple records in quick succession. We set the
session timeout to 45 minutes: if the gap between two
consecutive operations of a learner is more than 45 minutes,
we assume that the learner quit and logged in again, which
starts a new session.
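
The 45-minute rule can be illustrated with a minimal sketch. It assumes that each learner's events are available as a list of timestamps; the function name `split_sessions` and the sample events are hypothetical and are not taken from the KDD Cup data.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=45)  # threshold stated in the text


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group one learner's event timestamps into sessions.

    A new session starts whenever two consecutive events are
    more than `gap` apart.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # same session continues
        else:
            sessions.append([ts])     # gap exceeded: new session
    return sessions


# Hypothetical events for a single learner on one day.
events = [
    datetime(2015, 1, 5, 9, 0),
    datetime(2015, 1, 5, 9, 20),
    datetime(2015, 1, 5, 10, 30),   # 70 minutes after the previous event
    datetime(2015, 1, 5, 10, 40),
]
sessions = split_sessions(events)
print(len(sessions))  # 2
```

Per-session features such as f10-f12 and the session count f17 could then be derived from the returned groups.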


Consequently, for the current week t, we obtain a sequence of
feature vectors $(x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(t)})$
for each learner i across t weeks and the corresponding
sequence of dropout labels
$(y_i^{(1)}, y_i^{(2)}, \ldots, y_i^{(t)})$. If there are
activities associated with student i in the coming week, the
dropout label in week t is assigned as $y_i^{(t)} = 0$;
otherwise, $y_i^{(t)} = 1$. Notably, all features are centered
and normalized to unit standard deviation (mean of 0 and
variance of 1).
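
As an illustration of the labeling rule and the normalization step, the following sketch uses only the Python standard library. The helper names and the toy values are assumptions made for the example; the paper does not specify an implementation.

```python
from statistics import mean, pstdev


def dropout_labels(active_weeks, num_weeks):
    """Label week t with 1 (dropout) if the learner shows no
    activity in week t + 1, and 0 otherwise.

    Labels are produced for weeks 1 .. num_weeks - 1, since the
    last week has no following week to look at.
    """
    return [0 if (t + 1) in active_weeks else 1
            for t in range(1, num_weeks)]


def standardize(column):
    """Center a feature column and scale it to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:                      # constant feature: only center it
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical learner who is active in weeks 1, 2 and 4 of a 5-week log.
print(dropout_labels({1, 2, 4}, num_weeks=5))   # [0, 1, 0, 1]

# Hypothetical values of one feature across four learners.
z = standardize([2.0, 4.0, 4.0, 6.0])
```

The population standard deviation is used here so that the scaled column has variance exactly 1; using the sample standard deviation instead would be an equally plausible reading.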


4. OUR MODEL 


4.1 Feature Extractor: Stacked Sparse Autoencoder (SSAE)


Suppose that we have extracted weekly features from the user
activity log records. We employ a model named Stacked Sparse
Autoencoder (SSAE) to discover a high-level representation of
the input features and the correlations among them. In this
part, we aim to produce a better feature representation that
reveals learners' patterns of behavior.


Autoencoder neural networks are a series of models that
re-represent features by encoding them into a high-level
representation using one set of parameters and decoding that
representation back to the original values using another set
of parameters. A sparse autoencoder neural network consists of
an input layer, a hidden layer, and an output layer, in which
the hidden layer is larger than the input layer. The network
structure is presented in Figure 1.


[Figure 1: Network Structure. Input features feed a hidden
layer, which feeds an output layer of restored features.]


Formally, let the vector of the input layer, $x_i$, be the
features of learner i extracted from the weekly history of
learning behavior. We train the network to minimize the
divergence between the input layer and the output layer, i.e.,
$h_{W,b}(x_i) \approx x_i$. After the model converges, which
means it achieves a minimal difference between the input
features and the output values, the hidden layer has learned a
new representation of the input. The number and dimensions of
the hidden layers control the complexity of the network and
must be tuned to find their optimal values. Notably, the new
features represented by the hidden layer serve as the input of
a classifier. To train this autoencoder network, we apply the
back-propagation algorithm to minimize the following overall
cost function:


$$J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j,k} (W \odot W)_{jk}$$


where $J(W,b)$ consists of two parts: an average
sum-of-squares error and a penalty term that helps prevent
overfitting. $\sum_{j,k} (W \odot W)_{jk}$ denotes the sum of
all elements of the element-wise product of W with itself, and
$\beta$ is the weight of the sparsity penalty term.


We omit the computational details here; they can be found
in [10].
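
To make the structure of the cost concrete, the sketch below evaluates a reconstruction error plus the weighted sum of squared weights for a toy example. It is only an illustration: the 0.5 factor in the error, the function names, and all numbers are assumptions, and the additional penalty that the text says is contained in J(W,b) is left out.

```python
def mean_squared_error(inputs, outputs):
    """Average over samples of 0.5 * ||x - x_hat||^2 (0.5 is an assumption)."""
    total = 0.0
    for x, x_hat in zip(inputs, outputs):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(inputs)


def weight_penalty(W):
    """Sum of all elements of the element-wise product of W with itself."""
    return sum(w * w for row in W for w in row)


def penalized_cost(inputs, outputs, W, beta):
    """Reconstruction error plus beta times the weight penalty."""
    return mean_squared_error(inputs, outputs) + beta * weight_penalty(W)


# Hypothetical inputs, reconstructions, and weight matrix.
inputs = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.5, 0.0], [0.0, 1.0]]
W = [[1.0, -2.0], [0.5, 0.0]]
cost = penalized_cost(inputs, outputs, W, beta=0.1)
```

With these toy values the error term is 0.0625 and the penalty term is 0.1 x 5.25, which shows how a larger beta shifts the optimum toward smaller weights.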


To generate more general (higher-level) features, we use
stacking to increase the capacity of our model. We first train
one autoencoder and then use its hidden features as the input
and target output of another autoencoder. We thus obtain a
more abstract representation of the original features, which
may be more suitable for describing learners' inner condition.


Compared with other methods such as PCA, the
neural-network-based SSAE is more powerful. In most cases,
relations between raw features are complex and cannot be
represented by simple functions such as linear functions, so
traditional methods cannot separate them well. Neural
networks, however, can approximate any function given enough
capacity (e.g., enough layers or enough units), which allows
them to project the raw features into a space of independent,
orthogonal components.


4.2 Sequenced feature combiner: RNN 

An RNN (Recurrent Neural Network) belongs to a class of
artificial neural networks that deal with sequence data. It
takes sequenced data step by step and, at every step,
generates an output based on all previous inputs. A basic RNN
with one hidden layer is shown in Figure 2.


[Figure 2: Basic RNN Structure. An input layer, a hidden
layer, and an output layer.]


Formally, an RNN is a function that maps an input vector and
the current hidden state to an output vector and an updated
hidden state, where h is the hidden state (memory) of the
hidden units, D is the size of the input vector, and L is the
size of the output vector. The memory h changes each time a
new input is given.


The input vector of the RNN is the high-level representation
generated by the SSAE introduced in Section 4.1. We aim to
obtain a good feature representation that captures each
learner's entire event history in a fixed-length vector, in
order to make predictions and to classify dropout learners by
their reasons for leaving.


A simple RNN has parameters $(W_h, U_h, W_y, b_h, b_y)$,
where $W_h$ controls what to absorb into memory from the input
features, $U_h$ determines what to remember and what to forget
from the previous memory state, $W_y$ sets the output value,
and $b_h$ and $b_y$ are biases that apply a global offset to
the hidden state and the output value, respectively.


The computation of this kind of RNN is as follows:


$$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$$
$$y_t = \sigma_y(W_y h_t + b_y)$$


where $x_t$ and $y_t$ represent the input features and the
output vector at time t, and $h_t$ is the memory held by the
RNN. Here, $\sigma_h$ and $\sigma_y$ can be the same or
different activation functions. Typical choices are the
sigmoid function and the tanh function. We choose tanh as the
activation function for both formulas, as it typically yields
faster training (and sometimes better local minima). The tanh
function is calculated as follows:


$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
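
The two update equations can be traced with a small sketch in plain Python. The matrix sizes and weight values below are hypothetical toy numbers, not parameters from the trained model, and the helper names are chosen only for this example.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_step(x_t, h_prev, W_h, U_h, b_h, W_y, b_y):
    """One step of the simple RNN with tanh for both activations."""
    pre_h = [a + b + c for a, b, c in
             zip(matvec(W_h, x_t), matvec(U_h, h_prev), b_h)]
    h_t = [math.tanh(v) for v in pre_h]
    y_t = [math.tanh(a + b) for a, b in zip(matvec(W_y, h_t), b_y)]
    return h_t, y_t


def rnn_forward(xs, h0, params):
    """Run the RNN over a sequence; return the final memory and all outputs."""
    h, outputs = h0, []
    for x_t in xs:
        h, y_t = rnn_step(x_t, h, *params)
        outputs.append(y_t)
    return h, outputs


# Hypothetical toy sizes: D = 2 inputs, 2 hidden units, L = 1 output.
params = (
    [[0.5, -0.3], [0.1, 0.8]],   # W_h
    [[0.2, 0.0], [0.0, 0.2]],    # U_h
    [0.0, 0.0],                  # b_h
    [[1.0, -1.0]],               # W_y
    [0.0],                       # b_y
)
weeks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three weekly feature vectors
h_final, ys = rnn_forward(weeks, [0.0, 0.0], params)
```

The final memory `h_final` plays the role of the fixed-length summary of a learner's history; it has the same length regardless of how many weeks are fed in.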


We do not apply the LSTM used by other researchers [8], for
two reasons. First, an LSTM is a special kind of RNN with the
ability to forget: it can determine what to remember and when
to forget its memory as new inputs arrive, whereas a simple
RNN can only remember all its inputs. We believe that, for a
sequence no longer than six steps, forgetting is unnecessary.
Second, a simple RNN requires less computation, which makes it
more suitable for a large-scale online service.


4.3 Classifier 
4.3.1 Support Vector Machine (SVM)


Some prior work mentioned in the related work inspires us 
to employ SVM to predict the learning state in the next 
consecutive two weeks. The SVM computes an affine-linear 
prediction function based on maximizing the margin of pos- 
itive and negative examples: 


$$(w, b) = \arg\min_{w,b} \frac{1}{2}\|w\|^{2} + C \sum_{i} \max\bigl(0,\, 1 - y_i(\langle w, x_i \rangle + b)\bigr)$$

After extracting features, we predict with the SVM and compare
its results with those of Softmax. Because there is a distinct
imbalance between dropout users and non-dropout users, we use
random sampling to limit the number of these users to a
comparatively small set. With this done, the resulting model
does not overfit to either class.


With the learning features of the current week obtained in
Section 3.2 as input, we apply the SVM to predict whether a
learner will drop out at the end of this week. Three kernel
functions (linear, rbf, and mlp) are tried, and the prediction
accuracy is estimated via 5-fold cross-validation.


4.3.2 Softmax Regression

In the softmax regression setting, we are interested in
multi-class classification (as opposed to binary
classification). Learners are classified into three cases,
represented as {(0,0), (0,1), (1,1)}, where 1 means dropout;
the first number indicates whether the learner drops out after
one week, and the second indicates the result after two weeks.
The label can thus take 3 different values, so the predicted
outcome for the i-th learner is $y_i \in \{1, 2, 3\}$.


We aim to estimate, for each learner, the probability of the
class label taking on each of the 3 possible values. Thus, our
hypothesis outputs a 3-dimensional vector (whose elements sum
to 1) giving the 3 estimated probabilities. Concretely, our
hypothesis takes the form:


$$h_\theta(x) = \begin{bmatrix} p(y = 1 \mid x; \theta) \\ p(y = 2 \mid x; \theta) \\ p(y = 3 \mid x; \theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{3} e^{\theta_j^{T} x}} \begin{bmatrix} e^{\theta_1^{T} x} \\ e^{\theta_2^{T} x} \\ e^{\theta_3^{T} x} \end{bmatrix}$$


where $\theta_1, \theta_2, \theta_3 \in \mathbb{R}^{n}$ are
the model parameters of softmax, and the factor
$1 / \sum_{j=1}^{3} e^{\theta_j^{T} x}$ normalizes the
distribution so that the probabilities sum to 1.
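
A minimal sketch of the hypothesis follows. The assignment of class indices 1, 2, 3 to the pairs (0,0), (0,1), (1,1) is an assumption made for this example, as are the parameter values; subtracting the maximum score is a standard numerical-stability step that does not change the result.

```python
import math

# Assumed mapping from class index to
# (dropout after one week, dropout after two weeks).
CLASSES = {1: (0, 0), 2: (0, 1), 3: (1, 1)}


def softmax_hypothesis(thetas, x):
    """Return [p(y=1|x), p(y=2|x), p(y=3|x)] for parameter vectors thetas."""
    scores = [sum(t * v for t, v in zip(theta, x)) for theta in thetas]
    top = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(thetas, x):
    """Pick the most probable class and decode it into the two weekly states."""
    probs = softmax_hypothesis(thetas, x)
    label = probs.index(max(probs)) + 1
    return label, CLASSES[label]


# Hypothetical parameters and input with n = 2.
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [0.2, 2.0]
probs = softmax_hypothesis(thetas, x)
label, states = predict(thetas, x)
```

In this toy case the second class has the largest score, so the learner would be predicted to stay for one more week and drop out in the week after.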


5. EXPERIMENTS 

5.1 AUC Score 

The KDD Cup label set contains 79% positives and 21%
negatives. Because of this class imbalance, accuracy is not a
good metric. Instead, the area under the receiver operating
characteristic curve (ROC AUC) is the main metric we use for
parameter tuning and model selection. AUC measures how likely
a classifier is to rank a randomly chosen positive sample
above a randomly chosen negative sample.
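
The pairwise interpretation of AUC can be computed directly, as in the sketch below. The labels and scores are hypothetical; the quadratic loop is meant only to show the definition and would be replaced by a rank-based computation on a dataset of this size.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical dropout labels (1 = dropout) and predicted scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)   # 5 of the 6 pairs are ordered correctly
```

Unlike accuracy, this value does not change if the proportion of positives and negatives is altered by resampling, which is why it suits the imbalanced label set.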


Model        | Week 1 | Week 2 | Week 3 | Week 4 | Week 5
SSAE+Softmax | 0.924  | 0.895  | 0.887  | 0.803  | 0.754
SSAE+SVM     | 0.894  | 0.867  | 0.849  | 0.784  | 0.729
SVM          | 0.831  | 0.826  | 0.817  | 0.749  | 0.698


Table 2: AUC comparison of SSAE+Softmax, SSAE+SVM, and SVM


Table 2 presents the average AUC scores across weeks obtained
with two different classifiers (Softmax and SVM). The results
indicate that the models that employ the SSAE to discover
correlations among the initial features extracted from the
dataset, namely SSAE+Softmax and SSAE+SVM, are more
competitive. They are superior to the baseline SVM model and
consistently achieve higher AUC scores. For instance, for the
first week, the AUC score of SSAE+SVM is 0.894, a 7.58%
improvement relative to that of SVM (0.831).


Specifically, our proposed model, SSAE+Softmax, outperforms
the other models in every week. This observation implies that
Softmax performs consistently better than SVM at classifying a
learner's previous states and predicting whether he or she
will drop out.


More notably, the AUC score decreases as the course
progresses. We infer that there may be more uncertainties
related to dropout behavior that our model cannot discover
from weekly history records alone. External forces, such as a
lack of free time, may result in more complex patterns of
behavior. For instance, a learner may leave suddenly at week 4
even though all statistical features of the previous three
weeks strongly indicate that he or she is not inclined to drop
out.


5.2 Confusion Matrix 

In this two-class classification problem, the confusion
matrix has 4 entries: true positive (TP), false negative (FN),
false positive (FP), and true negative (TN).


$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \text{True Positive Rate} = \frac{TP}{TP + FN}$$

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
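
The three metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The counts are hypothetical; the last line applies the F1 formula to the precision and recall reported for SSAE+Softmax in Table 3 as a consistency check.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 true positives, 20 false positives,
# 10 false negatives.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)

# Precision 0.891 and recall 0.942 (Table 3, SSAE+Softmax).
print(round(f1_from(0.891, 0.942), 3))  # 0.916
```

The value 0.916 agrees with the F1 score listed for SSAE+Softmax.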


The comparison of the metrics mentioned above is presented in
Table 3. The SSAE+Softmax model consistently outperforms the
other models, indicating that it handles the prediction task
well. The results across weeks lay a foundation for
identifying patterns of behavior and suggesting interventions
for inactive learners.


Model        | Precision | Recall | F1 score
SSAE+Softmax | 0.891     | 0.942  | 0.916
SSAE+SVM     | 0.873     | 0.907  | 0.890
SVM          | 0.854     | 0.887  | 0.870


Table 3: Performance comparison of SSAE+Softmax, SSAE+SVM,
and SVM


6. DISCUSSION 


Experimental results on a real-world dataset demonstrate that
dropout probability is consistently predictable across weeks
for different students. The next step in applying the newly
proposed model (SSAE+Softmax) to MOOC platforms is to mitigate
the dropout rate by suggesting interventions, such as sending
reminder emails, with the goal of helping at-risk learners
retain their interest.


Email is a very cheap medium for reaching learners and
creating awareness quickly. Our proposed model will help
determine to whom an email should be sent and when. Identifying
at-risk learners precisely avoids bombarding active learners
with unnecessary emails and, at the same time, reaches at-risk
learners in time to call back as many of them as possible.


Here we present only a preliminary framework for sending
reminder emails. Specifically, at the end of week t, we first
extract the weekly feature vectors for the t weeks and employ
SSAE+Softmax to predict the future states $y_t$ and
$y_{t+1}$. Then, we determine a candidate set of potential
at-risk learners who satisfy $y_t = 1$ and $y_{t+1} = 1$,
where $y_t$ denotes the status in the next week. Finally, we
observe the behavior of every selected learner in the coming
week t + 1. If the at-risk state is confirmed ($y_t = 1$), the
platform sends reminder emails immediately at the end of week
t + 1.


Although the experiments presented in this paper are limited
to the KDD Cup dataset, we plan to augment our model and
evaluate the effectiveness of sending reminder emails on a
real MOOC platform established by our university. Future work
applying this model is currently underway, and the approach to
sending emails will be improved step by step.
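
The selection logic of the preliminary reminder framework described above can be sketched as two small functions. This is only an illustration under assumptions: predictions are taken to be available as a mapping from learner id to the predicted pair of states, and the learner ids and function names are hypothetical.

```python
def candidates(predictions):
    """Learners predicted to drop out in both of the next two weeks.

    `predictions` maps a learner id to the predicted pair
    (y_t, y_{t+1}), where 1 means dropout.
    """
    return {lid for lid, pair in predictions.items() if pair == (1, 1)}


def learners_to_email(predictions, active_next_week):
    """Confirm the at-risk state with the behavior observed in week t + 1.

    A candidate is confirmed when no activity is observed in week
    t + 1, i.e. the learner is absent from `active_next_week`.
    """
    return sorted(lid for lid in candidates(predictions)
                  if lid not in active_next_week)


# Hypothetical predictions at the end of week t and observed activity
# during week t + 1.
predictions = {"a": (1, 1), "b": (0, 1), "c": (1, 1), "d": (0, 0)}
active_next_week = {"c", "d"}
print(learners_to_email(predictions, active_next_week))  # ['a']
```

Learner "c" is a candidate but shows activity in week t + 1, so no email is sent; waiting one week in this way trades a slower reaction for fewer unnecessary emails.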


With the features described in Section 3, we have completed
the analysis of dropout prediction using the model described
in Section 4. Given the model's performance, we are confident
that it can predict and diagnose dropout. In the following
part, we analyze how each feature influences the final dropout
probability by conducting a sensitivity analysis, and we
cluster dropout learners with the k-means algorithm to
identify their patterns of behavior.


To make the data comparable, we separate user events by course
and take the course with the most students (which is also the
one with the most students who completed it) as our example.
First, we try to identify the standard behavior of learners
who complete the course with good results. We take the event
logs of all non-dropout students, average each of the
features, and regard the result as a baseline requirement for
finishing this course. Next, we change each of the features
step by step and make predictions using our neural network
with fixed parameters, obtaining three outputs that represent
the probabilities of dropping out after one week, dropping out
after two weeks, or not dropping out. We compute a score
ranging from 0 to 1 to assess the influence of each feature.


Algorithm 1 Univariate analysis of feature_i

procedure UnivariateAnalysis(model, input_features)
    standard ← average(input_features)
    for rate ∈ (0.5 ... 1.5) do
        features ← standard
        features_i ← rate × features_i
        EvaluateDropoutRate(model, features)
    end for
end procedure


In Algorithm 1, "input_features" are the features of those who
complete the course, and "model" is the model introduced
above, which uses SSAE, RNN, and Softmax to predict a dropout
rate, regarded as a score ranging from 0 to 1.
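
Algorithm 1 can be written as a short runnable sketch. The trained network is replaced here by a hypothetical stand-in, `toy_model`, and the feature vectors and the step of 0.1 between rates are assumptions made for illustration.

```python
def univariate_analysis(model, input_features, index, rates):
    """Scale one feature of the average completer and record the model output.

    `model` is any callable mapping a feature vector to a dropout
    score in [0, 1]; `input_features` is a list of feature vectors
    of learners who completed the course.
    """
    n = len(input_features)
    standard = [sum(col) / n for col in zip(*input_features)]
    results = []
    for rate in rates:
        features = list(standard)           # start from the standard profile
        features[index] = rate * standard[index]
        results.append((rate, model(features)))
    return results


# Hypothetical stand-in for the trained network: the score falls as
# the first feature (e.g. videos watched) grows.
def toy_model(features):
    return max(0.0, min(1.0, 1.0 - features[0] / 20.0))


completers = [[10.0, 3.0], [14.0, 5.0]]       # hypothetical feature vectors
rates = [0.5 + 0.1 * k for k in range(11)]    # 0.5, 0.6, ..., 1.5
curve = univariate_analysis(toy_model, completers, index=0, rates=rates)
```

Plotting the second element of each pair in `curve` against the rate gives one sensitivity curve per feature; all other features stay at their standard values, so the curve isolates the effect of the chosen feature.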


Notably, these features representing learning behaviors are 
classified into two categories: those related to course materi- 
als directly (e.g., watching videos, browsing wiki) and those 
not (e.g., navigate, page_close). We test some features to 
show how they influence a learner’s dropout probability, as 
presented in Figure 3. 


When the number of video views is 60 percent of the standard
value, the dropout probability increases sharply from 0.12 to
0.875. Under the same reduction, the dropout probability for
the feature page_close increases from 0.52 to 0.774, a less
pronounced change. This implies that metrics closely related
to course materials matter more than the others: compared with
indirect activities, the amount of direct engagement with
course materials is highly relevant to the probability of
completing the course.


Figure 3: Sensitivity Analyses


We then cluster dropout learners with the k-means clustering
algorithm, setting k = 10. The features extracted in Section 3
are effective indicators of the behavior pattern to which a
low-performing learner belongs. We map each feature vector to
one of the 10 clusters. Two clusters contain noticeably more
low-performing learners than the others.
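
For reference, the clustering step can be sketched with a plain Lloyd iteration in standard-library Python. The example uses k = 2 and six hypothetical two-dimensional points instead of k = 10 and the 17-dimensional feature vectors, and it initializes the centroids with the first k points, which is a simplification rather than the procedure used in the paper.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def k_means(points, k, iterations=100):
    """Plain Lloyd iteration, initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; an empty
        # group keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:        # centroids are stable
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
          [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
centroids, assignments = k_means(points, k=2)
```

Counting how many low-performing learners fall into each cluster index in `assignments` corresponds to the comparison of cluster sizes made in the text.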


Inactive learners belonging to one of these clusters perform
worse the longer they remain enrolled. By monitoring their
learning behavior in terms of lecture videos, discussions, and
assignments, we find that the numbers decrease significantly
week by week. It can be inferred that they put less and less
effort into learning as the course continues, which is a
strong indicator of failing to keep up with the pace of the
course.


Inactive learners belonging to the other cluster display a
complex pattern of behavior. For instance, they leave the
course for one or two weeks and then come back to learn. At
the beginning, these learners display a high level of
perseverance and self-discipline. Almost all the statistics
show that they have regular patterns of studying, which is
confirmed by the low dropout probability computed by our
model. However, they perform poorly in the following weeks.
Specifically, for some learners, the numbers of videos
watched, discussion posts made, and problems attempted all
drop to 0 suddenly. After some weeks, these learners come back
to learn, and all their learning data reach their highest
values compared with previous weeks. Finally, they do not take
the exams and drop out. It may be inferred that such learners
are "trying but not succeeding", owing to limited time (or
perhaps other external forces).


In the future, to extend our model, we will send a survey to
learners who are predicted to leave the course in order to
find out why they are disengaging. This will shed light on the
relationships between learners' behavior patterns and the
reasons why they quit the course.


7. CONCLUSIONS 


In this paper, we propose different composite models that
incorporate multiple features to infer behavior for the next
two weeks based on features extracted from the weekly history
of learning data. The SSAE+Softmax model consistently achieves
a higher AUC score than the baseline SVM model. In addition,
an application of the model that includes an automated email
reminder system is under construction.